# 03. Overview of the ND Program

The Deep Reinforcement Learning Nanodegree program is divided into four parts, which together give you a thorough grounding in deep reinforcement learning and cover its major topics.

## Part 1: Foundations of Reinforcement Learning

The first part begins with a simple introduction to reinforcement learning. You'll learn how to define real-world problems as Markov Decision Processes (MDPs), so that they can be solved with reinforcement learning.

How might we use [reinforcement learning](https://arxiv.org/pdf/1803.05580.pdf) to teach a robot to walk? ([Source](https://spectrum.ieee.org/automaton/robotics/industrial-robots/agility-robotics-introduces-cassie-a-dynamic-and-talented-robot-delivery-ostrich))

Then, you'll implement classical methods such as SARSA and Q-learning to solve several environments in OpenAI Gym. You'll also explore how techniques such as tile coding and coarse coding expand the range of problems that traditional reinforcement learning algorithms can solve.

Train a car to navigate a steep hill using Q-learning.
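To give a flavor of what you'll implement, here is a minimal sketch of the tabular Q-learning update. The environment shape and hyperparameters are illustrative only, not taken from the course materials:

```python
import numpy as np

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """Apply one Q-learning (off-policy TD) update to the table in place."""
    td_target = reward + gamma * np.max(Q[next_state])       # bootstrap from the best next action
    Q[state, action] += alpha * (td_target - Q[state, action])
    return Q

Q = np.zeros((5, 2))                  # toy table: 5 states, 2 actions
q_update(Q, state=0, action=1, reward=1.0, next_state=1)
# Q[0, 1] is now alpha * reward = 0.1, since the next state is still all zeros.
```

SARSA differs only in the target: it bootstraps from the action actually taken next, rather than the maximizing action.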

## Part 2: Value-Based Methods

In the second part, you'll learn how to leverage neural networks when solving complex problems using the Deep Q-Networks (DQN) algorithm. You will also learn about modifications such as double Q-learning, prioritized experience replay, and dueling networks. Then, you'll use what you’ve learned to create an artificially intelligent game-playing agent that can navigate a spaceship!

Use the DQN algorithm to train a spaceship to land safely on a planet.
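Two ingredients of DQN can be sketched without any neural network at all: a replay buffer, which breaks the correlation between consecutive transitions by sampling past experience at random, and an epsilon-greedy action rule. The class and function names below are illustrative, not part of any course API:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s', done) transitions."""
    def __init__(self, capacity=10000):
        self.memory = deque(maxlen=capacity)   # old transitions are evicted automatically

    def add(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.memory, batch_size)

def epsilon_greedy(q_values, epsilon):
    """Random action with probability epsilon, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

buf = ReplayBuffer()
for t in range(100):                           # fill with dummy transitions
    buf.add(t, 0, 1.0, t + 1, False)
batch = buf.sample(32)                         # a decorrelated minibatch for training
```

Prioritized experience replay replaces the uniform `sample` with one that favors transitions with large TD error; double and dueling DQN change only how the Q-values themselves are computed.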

You'll learn from experts at NVIDIA's Deep Learning Institute how to apply your new skills to robotics applications. Using a Gazebo simulation, you will train a rover to navigate an environment without running into walls.

Learn from experts at NVIDIA how to navigate a rover!

You'll also get the first project, where you'll write an algorithm that teaches an agent to navigate a large world.

In Project 1, you will train an agent to collect yellow bananas while avoiding blue bananas.

All of the projects in this Nanodegree program use the rich simulation environments from the Unity Machine Learning Agents (ML-Agents) software development kit (SDK). You will learn more about ML-Agents in the next concept.

## Part 3: Policy-Based Methods

In the third part, you'll learn about policy-based and actor-critic methods such as Proximal Policy Optimization (PPO), Advantage Actor-Critic (A2C), and Deep Deterministic Policy Gradients (DDPG). You’ll also learn about optimization techniques such as evolution strategies and hill climbing.

Use Deep Deterministic Policy Gradients (DDPG) to train a robot to walk.
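Of the techniques above, hill climbing is the simplest to sketch: perturb the current best policy weights with Gaussian noise and keep the perturbation only if it scores higher. The quadratic objective below is a stand-in for the episode return a real policy would collect (all names and numbers are illustrative):

```python
import numpy as np

def score(weights):
    """Toy stand-in for episode return; peaks at weights = 3."""
    return -np.sum((weights - 3.0) ** 2)

rng = np.random.default_rng(0)
best_w = np.zeros(4)                            # initial policy weights
best_s = score(best_w)
for _ in range(500):
    candidate = best_w + 0.1 * rng.standard_normal(4)   # random perturbation
    s = score(candidate)
    if s > best_s:                              # keep only improvements
        best_w, best_s = candidate, s
# best_w ends up close to [3, 3, 3, 3].
```

Evolution strategies generalize this idea by averaging over a whole population of perturbations instead of keeping a single winner.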

You'll learn from experts at NVIDIA about their active research into applying deep reinforcement learning techniques to finance. In particular, you'll explore an algorithm for the optimal execution of portfolio transactions.

You'll also get the second project, where you'll write an algorithm to train a robotic arm to reach moving target positions.

In Project 2, you will train a robotic arm to reach target locations.

## Part 4: Multi-Agent Reinforcement Learning

Most of reinforcement learning is concerned with a single agent that seeks to demonstrate proficiency at a single task; in this agent's environment, there are no other agents. However, if we'd like our agents to become truly intelligent, they must be able to communicate with and learn from other agents. In the final part of this Nanodegree program, we will extend the traditional framework to include multiple agents.

You'll also learn all about Monte Carlo Tree Search (MCTS) and master the skills behind DeepMind's AlphaZero.

Use Monte Carlo Tree Search to play Connect 4. ([Source](https://github.com/Alfo5123/Connect4))
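At the heart of MCTS is its selection rule. The classic choice, UCB1, balances a child node's average value (exploitation) against a bonus for rarely visited children (exploration); AlphaZero-style search adds a learned prior on top of this idea. The statistics below are made-up numbers for illustration:

```python
import math

def ucb1(child_value_sum, child_visits, parent_visits, c=1.41):
    """UCB1 score of a child node during MCTS selection."""
    if child_visits == 0:
        return float("inf")                 # always expand unvisited children first
    exploit = child_value_sum / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore

# Three children of a node that has been visited 30 times:
stats = [(8.0, 10), (12.0, 18), (0.0, 2)]   # (value_sum, visits) per child
scores = [ucb1(v, n, 30) for v, n in stats]
best_child = max(range(3), key=lambda i: scores[i])
# The rarely visited third child wins here: its exploration bonus
# outweighs the higher average values of its well-explored siblings.
```

A full MCTS iteration repeats this selection down the tree, expands a leaf, estimates its value (by rollout or, in AlphaZero, a neural network), and backs the result up to the root.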

You'll also get the third project, where you'll write an algorithm to train a pair of agents to play tennis.

In Project 3, you will train a pair of agents to play tennis. ([Source](https://blogs.unity3d.com/2017/09/19/introducing-unity-machine-learning-agents/))